Principal component methods - hierarchical clustering - partitional clustering: why would we need to choose for visualizing data?
Authors
Abstract
This paper combines three exploratory data analysis methods (principal component methods, hierarchical clustering, and partitioning) to enrich the description of the data. Principal component methods are used as a preprocessing step for the clustering in order to denoise the data, transform categorical data into continuous data, or balance groups of variables. The principal component representation is also used to visualize the hierarchical tree and/or the partition on a 3D map, which makes the data easier to understand. The proposed methodology is available in the HCPC (Hierarchical Clustering on Principal Components) function of the FactoMineR package.
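The three steps named in the abstract can be sketched in plain Python. This is an illustrative sketch only, not the FactoMineR HCPC implementation: the function names (`pca_scores`, `ward_partition`, `kmeans_consolidate`), the toy data, the Ward merging criterion, and the k-means refinement of the cut are all assumptions made for the example, since the abstract only says "hierarchical clustering and partitioning".

```python
import math

def standardize(rows):
    """Center each column and scale it to unit (population) variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def pca_scores(rows, ncp, iters=200):
    """Scores on the first ncp principal axes (power iteration with deflation)."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / n for b in range(p)] for a in range(p)]
    axes = []
    for _ in range(ncp):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary starting vector
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        # Deflate: remove the axis just found before searching for the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
        axes.append(v)
    return [[sum(r[j] * v[j] for j in range(p)) for v in axes] for r in z]

def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_partition(points, k):
    """Naive agglomerative clustering with Ward's criterion, cut at k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sqdist(ca, cb)  # inertia increase
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

def kmeans_consolidate(points, clusters, iters=10):
    """Refine the tree cut with k-means started from the cluster centroids."""
    centers = [centroid([points[i] for i in c]) for c in clusters]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: sqdist(pt, centers[c]))
                  for pt in points]
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels

# Toy data (made up for this sketch): two well-separated groups, three variables.
data = [
    [1.0, 2.0, 1.5], [1.2, 1.9, 1.4], [0.9, 2.1, 1.6], [1.1, 2.0, 1.3],
    [5.0, 6.1, 5.5], [5.2, 5.9, 5.6], [4.9, 6.0, 5.4], [5.1, 6.2, 5.7],
]
scores = pca_scores(data, ncp=2)          # step 1: keep the first two axes
tree_cut = ward_partition(scores, k=2)    # step 2: hierarchical clustering
labels = kmeans_consolidate(scores, tree_cut)  # step 3: partitioning
print(labels)
```

Clustering on the retained principal component scores rather than on the raw variables is what gives the denoising effect mentioned above: the discarded axes carry little variance and are treated as noise. The same scores also provide the coordinates on which a tree or partition can be drawn.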
Similar resources
Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors
Homogeneity of groups is important in studies that use cross-section and multi-level data. Most studies in economics, especially panel data analyses, need some kind of homogeneity to ensure the validity of results. This paper presents methods known as clustering and homogenization of groups in cross-section studies based on enviro-economic components. For this, a sample of 92 countries which...
Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation
1- INTRODUCTION The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Differing estimates of sediment amounts, along with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...
Comparison of Hierarchical and Non-Hierarchical Clustering Results for Proteins Associated with Esophageal, Gastric, and Colon Cancers Based on Gene Ontology Annotation Similarities
Background and Objective: The use of proteomic methodologies and the advent of high-throughput (HTP) investigation of proteins have created a need for new approaches to the bioinformatics analysis of experimental results. Cluster analysis is a statistical procedure well suited to analyzing these data sets. Materials and Methods: In this research study, the identified proteins associated wi...
A new method for hierarchical clustering combination
In the field of pattern recognition, combining different classifiers into a robust classifier is a common approach for improving classification accuracy. Recently, this trend has also been used to improve clustering performance, especially in non-hierarchical clustering approaches. Generally, hierarchical clustering is preferred over partitional clustering for applications when ...